Chroma
Querying Overview
One way to specify a Chroma query in Qarbine is to use a JSON-like structure. Below is an example that retrieves up to 10 matches from the fruit collection.
{
"collection": "fruit",
"nResults": 10,
"queryTexts": "color"
}
Here is a sample result. By default the ordering is closest distance first.
Shown below are the details of the first element.
Notice that the metadata fields (chapter and verse) are automatically pulled up to the main object level. The raw value of the element’s metadata is
{"chapter": "3", "verse": "16"}
The metadata can be any JSON object and is not limited to a single depth.
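The pull-up behavior described above can be sketched as follows. This is only an illustration, not Qarbine's implementation: the function name `pull_up_metadata`, the sample id and document values, and the rule that a primary field wins on a name clash are all assumptions.

```python
from typing import Any, Dict


def pull_up_metadata(element: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `element` with its metadata keys copied to the top level.

    Assumption: if a metadata key clashes with a primary field
    (id, document, distance, ...), the primary field is kept.
    """
    flattened = dict(element.get("metadata") or {})
    flattened.update(element)  # primary fields override same-named metadata keys
    return flattened


# Hypothetical element; only the metadata value comes from the text above.
element = {
    "id": "id-1",
    "document": "sample text",
    "distance": 0.42,
    "metadata": {"chapter": "3", "verse": "16"},
}
row = pull_up_metadata(element)
```

The raw `metadata` object stays on the element, so both `row["chapter"]` and `row["metadata"]["chapter"]` are available.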
Filtering can be applied to queries as well. For example,
{
nResults: 10,
collection: "fruit",
queryTexts: "color",
where: {"chapter": {"$eq": "3"}},
}
Here is a sample result.
The possible main fields of a Chroma answer set element are:
- id,
- document,
- distance,
- metadata and
- embedding.
Your query specification indicates which fields to return. This is discussed below.
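For orientation, Chroma's Python client returns query results in a columnar layout (`ids`, `documents`, `distances`, `metadatas`, `embeddings`, each holding one inner list per query). The sketch below reshapes such a result into the per-element fields listed above. The helper name `to_rows` and the sample values are assumptions, and how Qarbine performs this step internally is not stated in this document.

```python
from typing import Any, Dict, List

# Columnar result key -> per-element field name used in the list above.
FIELD_NAMES = (
    ("documents", "document"),
    ("distances", "distance"),
    ("metadatas", "metadata"),
    ("embeddings", "embedding"),
)


def to_rows(result: Dict[str, Any], query_index: int = 0) -> List[Dict[str, Any]]:
    """Convert one query's columnar result into a list of element dicts."""
    rows = []
    for i, element_id in enumerate(result["ids"][query_index]):
        row = {"id": element_id}
        for plural, singular in FIELD_NAMES:
            column = result.get(plural)
            if column is not None:  # fields that were not requested come back as None
                row[singular] = column[query_index][i]
        rows.append(row)
    return rows


# Hypothetical result for a single query text.
result = {
    "ids": [["a", "b"]],
    "documents": [["red apple", "yellow banana"]],
    "distances": [[0.1, 0.3]],
    "metadatas": [[{"chapter": "3"}, {"chapter": "4"}]],
    "embeddings": None,
}
rows = to_rows(result)
```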
Query Specification Options
Primary Options
The primary specification options are described below.
Field | Description |
---|---|
collection | The Chroma collection to perform the query upon. |
nResults | The maximum number of matches to return. A “limit” field can be used as an alias for nResults. The Chroma default is 10. |
queryEmbeddings | An embedding value to be used as criteria. Its size must match the collection! |
queryTexts | The text from which to obtain an embedding value. |
where | The criteria for the metadata filtering. For example, where: {"metadata_field": "is_equal_to_this"}. For more details see https://docs.trychroma.com/docs/querying-collections/metadata-filtering |
whereDocument | The criteria for text searching within the element document field. For example, whereDocument: {"$contains": "search_string"}. For more details see https://docs.trychroma.com/docs/querying-collections/full-text-search |
include | The list of primary element fields to include in the answer set. The recognized keywords are: all, metadatas, distances, documents and embeddings. A “*” indicates “metadatas, distances, documents”. The “all” includes these plus the embeddings. These keywords are case sensitive! |
nearText | The value is a string with the similarity phrase. For example “dracula movies”. An embedding value for the nearText argument will be obtained by Qarbine using a configured Qarbine AI Assistant. When using this option the model used to insert the Chroma data must correspond to the one used by the Qarbine AI Assistant. |
includeMetadatas | The list of metadata fields to include in the result. This trimming takes place AFTER the Chroma answer set is returned to the Qarbine host and before any template processing. |
explain | Pass in true to see the native argument to be sent to Chroma’s query function. For debugging purposes this still includes any collection, includeMetadatas, sortBy and sortBySql values. |
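To make the table concrete, here is a rough sketch of how a specification could be mapped onto the keyword arguments of the Chroma Python client's `collection.query()` call (`n_results`, `query_texts`, `query_embeddings`, `where`, `where_document`, `include`). The function `to_query_kwargs` and its handling of the "limit" alias and the "*" and "all" keywords are assumptions based on the descriptions above, not Qarbine's actual code.

```python
from typing import Any, Dict

# Specification field -> Chroma Python client keyword argument.
KEY_MAP = {
    "nResults": "n_results",
    "queryTexts": "query_texts",
    "queryEmbeddings": "query_embeddings",
    "where": "where",
    "whereDocument": "where_document",
    "include": "include",
}

# Expansion of the two shorthand include keywords described in the table.
INCLUDE_KEYWORDS = {
    "*": ["metadatas", "distances", "documents"],
    "all": ["metadatas", "distances", "documents", "embeddings"],
}


def to_query_kwargs(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Build query keyword arguments from a Qarbine-style specification."""
    spec = dict(spec)
    if "limit" in spec and "nResults" not in spec:
        spec["nResults"] = spec.pop("limit")  # "limit" is an alias for nResults
    kwargs = {target: spec[key] for key, target in KEY_MAP.items() if key in spec}
    if isinstance(kwargs.get("query_texts"), str):
        kwargs["query_texts"] = [kwargs["query_texts"]]  # the client expects a list
    include = kwargs.get("include")
    if isinstance(include, str):
        kwargs["include"] = INCLUDE_KEYWORDS.get(include, [include])
    return kwargs


kwargs = to_query_kwargs(
    {"collection": "fruit", "limit": 10, "queryTexts": "color", "include": "*"}
)
```

Qarbine-only fields such as collection, includeMetadatas and explain are left out of the result because they are not arguments of the native query call.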
Qarbine Enhanced Interaction Options
SQL Oriented Filtering
Chroma supports both semantic (i.e. vector) search and lexical (i.e. scalar/matching) search. The specification structure described above can be verbose and cumbersome, though. To improve readability and productivity when authoring Chroma retrievals, Qarbine provides a SQL oriented option. For example, here is a vector search retrieval for the fruit collection.
{
nResults: 10,
collection: "fruit",
queryEmbeddings: [ 0.1, 0.3, 0.2, …],
}
The Qarbine SQL equivalent is simply
select *
from fruit
where queryEmbeddings(0.1, 0.3, 0.2, …)
limit 10
Note that any SQL list is enclosed in parentheses while one in the specification is enclosed in brackets. That is a subtle nuance across the SQL and JSON syntax standards.
Qarbine’s Chroma integration extends to the filtering features as well. Qarbine is your co-pilot translating SQL-oriented queries into their lower level specification equivalents. In some cases the Qarbine Data Source will have literally just the SQL statement above and nothing more. There are techniques to blend the ease of using SQL along with the powerful features of Chroma within a Qarbine JSON specification object. The table below lists the fields that drive this definition.
JSON Field | Description |
---|---|
sql | The SQL statement can affect all of the primary options listed above. |
sqlWhere | The string can affect all of the primary options listed above except for includeMetadatas and collection. |
sortBySql | The ORDER BY clause specifying how to sort the answer set AFTER it is returned from Chroma. |
Here is a simple example of combining the SQL and query specification approaches. The effective result is the same as the example query specification above.
{
sql: "select * from fruit",
queryTexts: "color",
}
The mapping of the standard SQL clauses to their Chroma equivalents is described below.
Clause | Description |
---|---|
SELECT | The names of the fields to return. Specifying “*” indicates all default object fields. You can also reference metadata properties, for example: SELECT chapter, verse; SELECT metadata, distance …; SELECT chapter, verse, distance … Including the ‘embeddings’ field in the SELECT list overrides the default behavior of omitting it from the answer set. Either the singular (distance) or plural (distances) keywords may be used. |
FROM | The name of the Chroma collection. This value sets the “collection” field in the query specification. |
WHERE | See the discussion below. The effect is to generally set the “where” field of the query specification. It can be much easier to specify criteria in this form than the embedded JSON object form. |
ORDER BY | The sorting rules in “column ASC” or “column DESC” format. Sorting is done by Qarbine after Chroma returns the answer set. This sets the “sortBySql” field of the query specification. |
LIMIT | Indicates at most how many elements to return. This sets the “nResults” field of the query specification. |
Bear in mind that some combinations of query fields may not make sense in the Chroma world.
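Because ORDER BY is applied after Chroma returns the answer set, it behaves like an ordinary in-memory sort over the returned rows. The following minimal sketch shows one way to do that; the helper `sort_rows`, its simple "column direction" parsing, and the sample rows are assumptions and do not describe Qarbine's parser.

```python
from typing import Any, Dict, List


def sort_rows(rows: List[Dict[str, Any]], order_by: str) -> List[Dict[str, Any]]:
    """Sort rows by an ORDER BY style string such as "chapter asc, distance desc"."""
    sorted_rows = list(rows)
    terms = [term.strip() for term in order_by.split(",") if term.strip()]
    # Apply the keys from last to first; Python's sort is stable,
    # so the first key ends up dominating the final order.
    for term in reversed(terms):
        parts = term.split()
        column = parts[0]
        descending = len(parts) > 1 and parts[1].lower() == "desc"
        sorted_rows.sort(key=lambda row, c=column: row[c], reverse=descending)
    return sorted_rows


# Hypothetical answer set rows.
rows = [
    {"chapter": "3", "distance": 0.2},
    {"chapter": "2", "distance": 0.5},
    {"chapter": "3", "distance": 0.1},
]
ordered = sort_rows(rows, "chapter asc, distance desc")
```

Note that sorting happens only over the rows Chroma returned, so it reorders the nearest matches rather than changing which matches are retrieved.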
Some Qarbine defined SQL functions are listed below.
Function | Description |
---|---|
nearVector | This clause is removed from the WHERE criteria and its list of numbers argument is set into the “queryEmbeddings” field of the query specification. These two clauses are equivalent: “where nearVector(1, 2, 3)” and “where vector = (1, 2, 3)”. |
nearText | This clause is removed from the WHERE criteria and its argument is set into the “nearText” field of the query specification. The nearText argument can be used by query.nearText(), hybrid.nearText(), or generate.nearText(). Indicate which operation is wanted in the query specification. |
queryTexts | An alias for nearText. A more natural Chroma function name. |
documentContains | A phrase which is set into the whereDocument field. The value will be of the form {"$contains": phrase} |
documentDoesNotContain | A phrase which is set into the whereDocument field. The value will be of the form {"$not_contains": phrase} |
withOption | Pass in the specification field name and the value to set. This clause is removed from the WHERE clause. |
withOptions | Set several specification fields at once. The format is withOptions(key1, value1, keyN, valueN). The key argument may use dot notation when setting the inner value of a component object. |
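The dot notation accepted by withOptions can be pictured as walking into nested objects of the specification and creating them as needed. The sketch below illustrates that idea; the function `with_options` and the sample keys are assumptions, not Qarbine's implementation.

```python
from typing import Any, Dict


def with_options(spec: Dict[str, Any], *pairs: Any) -> Dict[str, Any]:
    """Set key/value pairs on `spec`; dotted keys address nested objects."""
    if len(pairs) % 2 != 0:
        raise ValueError("withOptions expects key/value pairs")
    for key, value in zip(pairs[0::2], pairs[1::2]):
        *parents, leaf = key.split(".")
        target = spec
        for name in parents:
            target = target.setdefault(name, {})  # create missing inner objects
        target[leaf] = value
    return spec


spec = with_options(
    {"collection": "fruit"},
    "nResults", 5,
    "whereDocument.$contains", "apple",
)
```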
The WHERE clause criteria can be in a variety of traditional SQL forms and may include the Qarbine specific functions described above. For example,
select * from fruit where nearText("color")
results in a query specification with these fields,
collection: "fruit",
nearText: "color"
This example
select * from fruit
where queryTexts('color') and documentContains('apple')
results in a query specification with,
collection: "fruit",
nearText: "color",
whereDocument: { $contains: "apple"}
This example
{
collection: 'fruit',
sqlWhere: 'queryTexts("color") and chapter = "3"',
nResults: 3
}
results in a query specification with,
{
collection: "fruit",
nResults: 3,
where: { chapter: "3" },
queryTexts: [ "color" ]
}
This example
select * from products
where queryTexts("phone") and price < 2800 and price > 1200
and documentContains("blue")
results in a query specification with,
{
"collection": "products",
"where": {
"$and": [
{ "price": { "$lt": 2800 } },
{ "price": { "$gt": 1200 } }
]
},
"queryTexts": [ "phone" ],
"whereDocument": { "$contains": "blue" }
}
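The translation of simple SQL comparisons into a Chroma metadata filter, as in the example above, can be sketched like this. The operator names (`$eq`, `$ne`, `$lt`, `$lte`, `$gt`, `$gte`, `$and`) are Chroma's; the helper `build_where` and its input format of (field, operator, value) tuples are assumptions, and Qarbine's real SQL parser is not shown in this document.

```python
from typing import Any, Dict, List, Optional, Tuple

# SQL comparison operator -> Chroma metadata filter operator.
OPERATORS = {
    "=": "$eq",
    "!=": "$ne",
    "<": "$lt",
    "<=": "$lte",
    ">": "$gt",
    ">=": "$gte",
}


def build_where(conditions: List[Tuple[str, str, Any]]) -> Optional[Dict[str, Any]]:
    """Combine (field, operator, value) comparisons into a Chroma where filter."""
    clauses = [{field: {OPERATORS[op]: value}} for field, op, value in conditions]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]  # a single comparison needs no $and wrapper
    return {"$and": clauses}


where = build_where([("price", "<", 2800), ("price", ">", 1200)])
```

This sketch always emits the explicit `$eq` form for equality, whereas the earlier example shows the shorthand `{ chapter: "3" }`; Chroma accepts both.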
Reviewing the Generated Specification
You can enter criteria of the form “EXPLAIN SELECT ….” to have the SQL statement processed and have the returned answer set be the underlying query specification.
A convenient way of specifying this is to have “explain” on the first line and the rest of your SQL on the next lines.
explain
select * from fruit
where chapter = "3" and nearText('color')
limit 25
Shown below is the single answer set row.
Then simply “comment out” the first line when it is not in use.
// explain
select *
from fruit
where chapter = "3" and nearText('color')
limit 25
You can also use “explain: true” in the JSON query specification for similar information.
{
explain: true,
nResults: 10,
collection: "fruit",
queryTexts: "color",
where: {"chapter": {"$eq": "3"}},
}
Another way to get the specification is to press ALT and click . Below is a sample result.
Any “explain SELECT” or “explain: true” takes precedence over the ALT-click interaction.
Troubleshooting
A possible query error is
{"error":"Chroma 0.1 runNativeQuerySpecification error,
Collection expecting embedding with dimension of 384, got 1536"}
Another possible error is
When either of these occurs the Qarbine administrator must cross reference the associated Qarbine data service configuration’s embedding service API and embedding model values. This is explained in the Qarbine data service configuration document for Chroma.
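The dimension error above means the query embedding was produced by a model whose vector size differs from that of the vectors stored in the collection. A small pre-flight check like the sketch below can surface the mismatch before a query is sent. The function `check_dimension` is hypothetical, and the expected size has to come from knowledge of the model used to load the collection.

```python
from typing import Sequence


def check_dimension(embedding: Sequence[float], expected: int) -> None:
    """Raise ValueError if the query embedding size differs from the collection's."""
    actual = len(embedding)
    if actual != expected:
        raise ValueError(
            f"Collection expecting embedding with dimension of {expected}, got {actual}"
        )


check_dimension([0.0] * 384, expected=384)  # passes silently

try:
    check_dimension([0.0] * 1536, expected=384)
    message = ""
except ValueError as error:
    message = str(error)
```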
References
Information on generative AI service integrations can be found at
Details on configuring collection metadata characteristics can be found at